Description

This document shows the filled images produced by the fill_gaps.R script.

Different parameters were tested on the following data (note there are 2 different weeks, one with good weekly coverage and one without):

Region: Northwest Atlantic (NWA, 39 to 82 N, 42 to 95 W)  
Sensor: MODIS   
Resolution: 4km   
Processing level: Level 3, binned (L3b)  
Year: 2015  
Weeks: 9, 22  
Pixels outside 0-64 mg m^-3 removed  
Days with < 5% coverage removed  
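
The two filters at the end of the list above can be sketched in base R as follows. This is a minimal illustration, not the code in fill_gaps.R. The helper name `filter_chla`, the matrix layout (rows = pixels, columns = days), the handling of the range bounds (0 excluded, 64 included), and measuring coverage against all pixels in the matrix are assumptions.

```r
# Hypothetical helper: mask out-of-range pixels, then drop low-coverage days.
# chla: numeric matrix, rows = pixels, columns = days (assumed layout).
filter_chla <- function(chla, min_chla = 0, max_chla = 64, min_coverage = 0.05) {
  # Set values outside the accepted range to NA (bounds handling is an assumption).
  out_of_range <- !is.na(chla) & (chla <= min_chla | chla > max_chla)
  chla[out_of_range] <- NA
  # Fraction of valid pixels for each day, after masking.
  coverage <- colMeans(!is.na(chla))
  # Keep only days with at least min_coverage valid pixels.
  keep <- coverage >= min_coverage
  list(chla = chla[, keep, drop = FALSE],
       kept_days = which(keep),
       coverage = coverage)
}

# Toy example: 100 pixels x 3 days (made-up values).
set.seed(1)
toy <- matrix(NA_real_, nrow = 100, ncol = 3)
toy[1:50, 1] <- runif(50, 0.1, 10)          # 50% coverage, kept
toy[1:3, 2]  <- c(0.5, 1, 2)                # 3% coverage, dropped
toy[1:10, 3] <- c(runif(9, 0.1, 10), 120)   # one pixel above 64 mg m^-3
res <- filter_chla(toy)
res$kept_days
```

In the toy example the second day falls below the 5% threshold and is removed, while the out-of-range pixel on the third day is masked before coverage is computed.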

ImputeEOF removes randomly sampled valid pixels for cross-validation (CV). The number of pixels removed is the larger of 30 or 10% of the pixels. The function then adds EOFs one at a time, each time calculating the RMSE between the real and reconstructed CV pixels, and stops when the difference between the current RMSE and the RMSE of the previous iteration falls below a threshold (i.e. adding the most recent EOF did not significantly improve the RMSE). This threshold, called the “tolerance”, depends on whether the data are filled in linear space or in log space, since a log RMSE is only a fraction of the size of a linear RMSE:

Tolerance for filling logged data: 0.001
Tolerance for filling linear data: 0.01
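
The stopping rule described above can be illustrated with a small DINEOF-style loop in base R. This is a sketch under stated assumptions, not the ImputeEOF source: the function name `fill_eof_sketch`, the zero first guess for the gaps, the fixed number of inner iterations, and the synthetic test matrix are all hypothetical simplifications.

```r
# Minimal DINEOF-style sketch (illustrative only, not the ImputeEOF code).
# X: numeric matrix with NA gaps, rows = pixels, columns = time steps.
fill_eof_sketch <- function(X, tol = 0.01, max_eof = 10, n_iter = 50) {
  valid <- which(!is.na(X))
  # Cross-validation set: the larger of 30 pixels or 10% of the valid pixels.
  n_cv <- max(30, ceiling(0.1 * length(valid)))
  cv <- sample(valid, n_cv)
  truth <- X[cv]
  gaps <- c(which(is.na(X)), cv)   # positions to reconstruct
  Xf <- X
  Xf[gaps] <- 0                    # simple first guess (an assumption)
  rmse <- numeric(0)
  for (k in seq_len(max_eof)) {
    # Repeat the truncated SVD reconstruction with k EOFs.
    for (i in seq_len(n_iter)) {
      s <- svd(Xf, nu = k, nv = k)
      rec <- s$u %*% (s$d[seq_len(k)] * t(s$v))
      Xf[gaps] <- rec[gaps]
    }
    # RMSE between real and reconstructed cross-validation pixels.
    rmse[k] <- sqrt(mean((Xf[cv] - truth)^2))
    # Stop when the latest EOF improves the CV RMSE by less than tol.
    if (k > 1 && (rmse[k - 1] - rmse[k]) < tol) break
  }
  list(rmse = rmse, n_eof_tested = length(rmse))
}

# Synthetic rank-2 field with noise and about 25% gaps (made-up data).
set.seed(42)
n <- 60; m <- 40
field <- outer(sin(seq(0, 2 * pi, length.out = n)), cos(seq(0, 3, length.out = m))) +
  outer(seq(-1, 1, length.out = n), seq(1, 2, length.out = m))
X <- field + rnorm(n * m, sd = 0.05)
X[sample(n * m, 600)] <- NA
res <- fill_eof_sketch(X, tol = 0.01)
res$rmse
```

Passing `tol = 0.001` instead would mimic the stricter threshold listed above for logged data, where RMSE values are smaller.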

We start by using a year of data to fill the gaps, and compare different methods below. Then, using the best options, we’ll try using a longer time series.

For each method of filling gaps, we’ll examine the following:

The linear regression uses the standard major axis (SMA) method from lmodel2::lmodel2(), which minimizes the area of the triangle formed by each point and the regression line, rather than the distance in the x or y direction alone (i.e. it assumes there is error in both the independent and dependent variables, here the “real” and the filled/reconstructed data).
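
To make the difference from ordinary least squares concrete, the SMA line can be computed directly in base R: its slope is the ratio of the standard deviations, signed by the correlation, and it passes through the means. The sketch below uses made-up data and a hypothetical helper `sma_fit`; lmodel2::lmodel2() reports the same kind of fit alongside OLS, and is not called here so that the example runs without extra packages.

```r
# SMA regression in base R.
# slope = sign(r) * sd(y) / sd(x); the line passes through (mean(x), mean(y)).
sma_fit <- function(x, y) {
  slope <- sign(cor(x, y)) * sd(y) / sd(x)
  intercept <- mean(y) - slope * mean(x)
  c(intercept = intercept, slope = slope)
}

# Made-up "real" and "filled" cross-validation values.
set.seed(7)
real <- rnorm(200)
filled <- 0.5 + 0.8 * real + rnorm(200, sd = 0.3)

sma <- sma_fit(real, filled)
ols <- coef(lm(filled ~ real))

# Compare the two fits side by side.
rbind(SMA = sma, OLS = unname(ols))
```

The OLS slope equals the SMA slope multiplied by the correlation coefficient, so OLS is always the flatter of the two when the fit is imperfect. SMA is also symmetric: swapping the two variables gives the reciprocal slope, which suits a comparison where neither variable is error-free.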

Also note that for the tests that involve filling an 8day composite, in situ matchups should be interpreted with caution: the temporal bin is long, and concentrations and patterns can change within that time span.

An analysis of DINEOF on the Canadian Pacific coast:
Hilborn A, Costa M. Applications of DINEOF to Satellite-Derived Chlorophyll-a from a Productive Coastal Region. Remote Sensing. 2018; 10(9):1449. https://doi.org/10.3390/rs10091449

8day vs daily

Chla algorithm: OCx
Logged/linear data: Logged

Which is better: filling the gaps in 8day data, or filling the gaps in daily data and then averaging the result into an 8day image?

Although some R^2 metrics are higher for the daily filled version, and the RMSE for the total series and the week with good percent coverage are slightly lower, overall the 8day cross-validation data has a better fit and less bias (e.g. it identifies some patterns of higher concentration better than the daily fill), and gives a better reconstruction for weeks with poor percent coverage.
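
The "fill daily, then average" route needs a step that bins the filled daily images into 8day composites. A minimal sketch of that step is shown below. It assumes weeks are consecutive 8-day blocks counted from day 1 of the year (so week 9 covers days 65 to 72 and week 22 covers days 169 to 176), a pixels-by-days matrix, and a plain arithmetic mean. The helper name `daily_to_8day` and these conventions are assumptions, not taken from fill_gaps.R.

```r
# Hypothetical helper: average daily images into 8-day composites.
# daily: numeric matrix, rows = pixels, columns = days; doy: day of year per column.
daily_to_8day <- function(daily, doy) {
  week <- (doy - 1) %/% 8 + 1            # assumed week convention
  weeks <- sort(unique(week))
  comp <- do.call(cbind, lapply(weeks, function(w) {
    rowMeans(daily[, week == w, drop = FALSE], na.rm = TRUE)
  }))
  comp[is.nan(comp)] <- NA               # pixels with no valid day in the bin
  colnames(comp) <- paste0("week", weeks)
  comp
}

# Toy example: 4 pixels, the 16 days belonging to weeks 9 and 22.
doy <- c(65:72, 169:176)
daily <- matrix(as.numeric(seq_len(4 * 16)), nrow = 4)
daily[2, 1:8] <- NA                      # pixel 2 never observed in week 9
comp <- daily_to_8day(daily, doy)
comp
```

Whether the average should be taken on logged or linear values is a separate choice that changes the composite; the sketch averages whatever values it is given.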

8day

Number of EOF: 5 
 Total RMSE: 0.2224261 
 Week 9 RMSE: 0.2039704 
 Week 22 RMSE: 0.1992027

Daily

Number of EOF: 13 
 Total RMSE: 0.2024255 
 Week 9 RMSE: 0.2299895 
 Week 22 RMSE: 0.1829266

OCx vs POLY4

Temporal binning: 8day
Logged/linear data: Logged

Should the OCx or POLY4 algorithm be used? Note that POLY4 has been shown to remove some of the bias in the NWA.
OCx = global band-ratio
POLY4 = regional band-ratio, tuned to NWA

Although the POLY4 algorithm increases the RMSE, it also appears to remove some of the bias and provide a tighter fit around the 1:1 line of the CV regression, as well as improving the fit with the in situ matchups. POLY4 was tuned to remove the bias in the NWA that was present when using the OCx algorithm, creating a steeper gradient in chla concentration, which might explain the increase in RMSE as the higher range of chla could be harder to reconstruct.

OCx

Number of EOF: 5 
 Total RMSE: 0.2224261 
 Week 9 RMSE: 0.2039704 
 Week 22 RMSE: 0.1992027

POLY4

Number of EOF: 6 
 Total RMSE: 0.257584 
 Week 9 RMSE: 0.2671343 
 Week 22 RMSE: 0.2588712

Log vs linear

Temporal binning: 8day
Chla algorithm: POLY4

Should we use logged data or linear data to fill the gaps?
Note the process for the log option:

(Note that the RMSE is smaller when fitting logged data since it was calculated in log space)
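
The scale difference between the two RMSE values is easy to reproduce. The sketch below computes the RMSE of the same made-up "real" and "filled" concentrations in linear space and in log space. Using log10 is an assumption, since the base of the logarithm is not stated here.

```r
# RMSE between observed and estimated values.
rmse <- function(obs, est) sqrt(mean((obs - est)^2))

# Made-up concentrations (mg m^-3) spanning the kind of range seen in chla data.
real   <- c(0.2, 0.5, 1, 3, 10, 30)
filled <- real * c(1.5, 0.7, 1.2, 0.8, 1.4, 0.6)   # multiplicative errors

rmse_linear <- rmse(real, filled)                  # dominated by the largest values
rmse_log    <- rmse(log10(real), log10(filled))    # each pixel weighted by relative error

c(linear = rmse_linear, log = rmse_log)
```

In linear space the error is driven almost entirely by the few high-concentration pixels, while in log space every pixel contributes according to its relative error. The two values are therefore not directly comparable between the log and linear runs.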

Logged data gives a smoother fill and better R^2 in the CV regressions as it is not negatively impacted by isolated spikes over relatively low and consistent concentrations.

Log

Number of EOF: 6 
 Total RMSE: 0.257584 
 Week 9 RMSE: 0.2671343 
 Week 22 RMSE: 0.2588712

Linear

Number of EOF: 5 
 Total RMSE: 1.806879 
 Week 9 RMSE: 0.7463928 
 Week 22 RMSE: 1.298033

Longer time series

If more satellite images are used in the algorithm, will it improve the results?

Hilborn and Costa (2018) found that pixel reconstruction improved with more data in a smaller region on the Canadian Pacific coast. Up until this point we have only used one year of data to fill the gaps, but here we’ll try adding more (an equal number of years on either side of the target year, 2015).

Note that the 3year/5year DINEOF runs use the same cross-validation pixels for 2015 with extra randomly-selected pixels from the remaining years. Also, the CV regression below is performed using only the CV pixels for 2015 to give a more accurate comparison between methods.
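
One way to set up shared cross-validation pixels like this is sketched below for a 3-year run. The index table, the year labels, and the rule used to size the multi-year CV set (the same "larger of 30 or 10%" rule applied to the full series) are assumptions made for illustration.

```r
set.seed(2015)

# Hypothetical index table: one row per valid pixel value in a 3-year stack.
valid <- data.frame(id = 1:3000, year = rep(2014:2016, each = 1000))

# CV pixels for the single-year run (2015 only).
valid_2015 <- valid$id[valid$year == 2015]
cv_2015 <- sample(valid_2015, max(30, ceiling(0.1 * length(valid_2015))))

# Multi-year run: reuse the 2015 CV pixels, then top up from the other years.
n_cv_total <- max(30, ceiling(0.1 * nrow(valid)))
other <- valid$id[valid$year != 2015]
cv_3yr <- c(cv_2015, sample(other, n_cv_total - length(cv_2015)))

# The comparison regression uses only the CV pixels that fall in 2015.
cv_compare <- intersect(cv_3yr, valid_2015)
length(cv_compare)
```

Because the 2015 subset is identical in every run, differences in the CV regression reflect the length of the time series rather than a different random draw of pixels.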

Overall, expanding the time series gives a slight improvement to the results, most notably when using 3 years instead of a single year. Based on the RMSE summary plot at the bottom, a time series of 7 to 9 years gives the best results, but the diminishing decrease in RMSE with every added year might not be worth the extra processing time.

1 year

Number of EOF: 6 
 Total RMSE: 0.257584 
 Week 9 RMSE: 0.2671343 
 Week 22 RMSE: 0.2588712

3 years

Number of EOF: 11 
 Total RMSE: 0.2314012 
 Week 9 RMSE: 0.2522208 
 Week 22 RMSE: 0.2433103

5 years

Number of EOF: 13 
 Total RMSE: 0.2248182 
 Week 9 RMSE: 0.2504867 
 Week 22 RMSE: 0.2395364

7 years

Number of EOF: 15 
 Total RMSE: 0.2219548 
 Week 9 RMSE: 0.2450302 
 Week 22 RMSE: 0.2332346

9 years

Number of EOF: 17 
 Total RMSE: 0.2217943 
 Week 9 RMSE: 0.2449905 
 Week 22 RMSE: 0.2280072

Summary

Number of EOFs for 1/3/5/7/9 years: 6/11/13/15/17
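
The diminishing returns can be quantified from the total RMSE values reported in the sections above. The short calculation below uses only those numbers; the variable names are illustrative.

```r
# Total RMSE and number of EOFs reported above for each time series length.
years      <- c(1, 3, 5, 7, 9)
total_rmse <- c(0.257584, 0.2314012, 0.2248182, 0.2219548, 0.2217943)
n_eof      <- c(6, 11, 13, 15, 17)

# Decrease in total RMSE for each 2-year extension, absolute and in percent.
gain <- -diff(total_rmse)
pct  <- 100 * gain / head(total_rmse, -1)

summary_tab <- data.frame(
  from_years    = head(years, -1),
  to_years      = years[-1],
  rmse_decrease = round(gain, 4),
  pct_decrease  = round(pct, 2),
  added_eofs    = diff(n_eof)
)
summary_tab
```

Going from 1 to 3 years lowers the total RMSE by about 10%, whereas going from 7 to 9 years lowers it by less than 0.1%, even though each extension adds at least two EOFs.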